1. Introduction

This vignette showed the applications of TF activity that estimated from single-cell RNA sequencing data (Smart-seq2) during mouse HSC formation (Zhou et.al. 2018 & Wang et.al. 2022).

1.1 Data introduction

The single-cell RNA-sequencing data including 6 sequential cell populations during HSC development (aortic endothelial cells (AECs) and HSC-primed hemogenic-endothelial cells (HECs) from E10 aorta-gonad-mesonephros (AGM) region, T1 and T2 pre-HSCs from E11 AGM, E12 & E14 fetal liver(FL) HSCs, and HSCs from adult bone marrow).

The TF activity estimation followed with the workflow at metaRegulon vignette. Here, we don’t repeat the GRN inference and TF activity estimation steps, but showing the following steps such as cell-type-specificity analysis.

1.2 Loading data and generating objects.

The raw expression matrix and TF activity data can be downloaded from Zenodo.

suppressMessages(suppressWarnings(library(PUIC)))
suppressMessages(suppressWarnings(library(scATFR)))
suppressMessages(suppressWarnings(library(metaRegulon)))
atfr <- readRDS("D:/HSC_Autophagy/Markdown/5.metaRegulon/metaRegulon/inst/extdata/HSC_development/atfr.rds")

Next, we can extract the TF activity object act_sce that deposit in alternative assays from the atfr object for the following analysis.

act_sce <- altExp(x = atfr, e = "viper")
class(act_sce)
## [1] "SingleCellExperiment"
## attr(,"package")
## [1] "SingleCellExperiment"

Now, lets take a look at the values in the act_sce object:

assay(act_sce)[1:5,1:5]
##               E10.0_AEC_1_1 E10.0_AEC_1_2 E10.0_AEC_1_3 E10.0_AEC_1_4
## 2700081O15Rik     2.6691509     3.4384611      3.519365     2.8035412
## Adnp              0.9074099     1.5317953      1.168628     0.5021225
## Adnp2             0.3717352     0.0147019      1.855357     2.8388917
## Aebp1            -0.4412220    -1.2217474     -0.107998    -0.6754781
## Aebp2             9.1200368    10.7359366     10.183974    10.5986185
##               E10.0_AEC_1_5
## 2700081O15Rik      3.189519
## Adnp               1.525927
## Adnp2              2.124061
## Aebp1             -1.506020
## Aebp2              9.507078

We can see that the values in act_sce have negative values that may affect the following analysis steps, here we performed a liner normalization step to scales the values into 0 to 10.

assay(act_sce) <- t(apply(assay(act_sce),1,function(x){10*(x-min(x))/(max(x)-min(x))}))
assay(act_sce)[1:5,1:5]
##               E10.0_AEC_1_1 E10.0_AEC_1_2 E10.0_AEC_1_3 E10.0_AEC_1_4
## 2700081O15Rik      3.256182     4.5867276      4.726653      3.488614
## Adnp               5.732479     7.3491813      6.408844      4.683081
## Adnp2              1.642584     0.7662129      5.284269      7.698447
## Aebp1              8.256700     6.4896524      9.011094      7.726363
## Aebp2              4.952973     6.2213735      5.788111      6.113586
##               E10.0_AEC_1_5
## 2700081O15Rik      4.156174
## Adnp               7.333986
## Adnp2              5.943830
## Aebp1              5.846082
## Aebp2              5.256781

Then, we can also generate another object exp_sce including the expression of TFs obtained form atfr object.

exp_sce <- atfr[row.names(act_sce),]
exp_sce
## class: SingleCellExperiment 
## dim: 831 154 
## metadata(2): scATFR atfr_version
## assays(2): counts logcounts
## rownames(831): 2700081O15Rik Adnp ... Zxdc Zzz3
## rowData names(0):
## colnames(154): E10.0_AEC_1_1 E10.0_AEC_1_2 ... Adult_HSC_3739
##   Adult_HSC_3740
## colData names(3): sample stage sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(1): viper

2. Comparative analysis of dimensionality reduction

Here we compared the performance of TF expression and TF activity on dimensional reduction analysis

2.1 TF expression level

library(scater)
exp_sce <- runPCA(x=exp_sce, exprs_values="logcounts") 
exp_sce <- runTSNE(x = exp_sce, exprs_values="logcounts")
plotReducedDim(exp_sce, dimred="TSNE",colour_by = "stage")

We noticed that the order of stages is according to alphabet, but this is not our desire order. To custom the order of the stages, we can use reLevel() function:

exp_sce <- reLevel(x = exp_sce, colable="stage", level_order=c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC", "FL_HSC", "Adult_HSC"))
p <- plotReducedDim(exp_sce, dimred="TSNE",colour_by = "stage")
p

The parameter colable indicate the column label used for order adjustment, while the level_order indicate the updated order.

2.2 TF activity level

Next, we use TF activity to perform dimensional reduction analysis

act_sce <- reLevel(x = act_sce, colable="stage", level_order=c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC", "FL_HSC", "Adult_HSC"))
act_sce <- runPCA(x=act_sce, exprs_values="atfr") 
act_sce <- runTSNE(x = act_sce, exprs_values="atfr")
q <- plotReducedDim(act_sce, dimred="TSNE",colour_by = "stage")
q

To compare the dimensional reduction between TF expression and TF activity and save the results, one can use the following codes:

ggplot2::ggsave(filename ="tf_expression_vs_activity_tSNE_res.pdf" ,plot = p+q, device = "pdf", 
       path = "./", width = 150, height =50, dpi = 600, limitsize = FALSE, units ="mm")

2.3 Indicated TF expression vs activity

Here we introduce the demonstration of TFs at expression or activity level with tSNE plots. Since the dimensional reduction results of TF expression and TF activity is quit different, we use the tSNE result of TF activity as the coordinate. Firstly, we add logcounts matrix from exp_sce into object act_sce:

assay(act_sce,"logcounts") <- assay(exp_sce,"logcounts")[row.names(act_sce),]

Then, we can compare the TF expression and TF activity of the key TFs as follow:

i <- "Hes1"
p1 <- plotReducedDim(object = act_sce,dimred = "TSNE", colour_by = i, by_exprs_values = "logcounts", point_size=0.5, point_alpha=1)
p2 <- plotReducedDim(object = act_sce,dimred = "TSNE", colour_by = i, by_exprs_values = "atfr", point_size=0.5, point_alpha=1)
p1+p2

3. Differential activity analysis and cell-type-specificity evaluation

In this section, we are going to give some examples of differential analysis at TF expression and TF activity level. The TF activity can perform nearly all the same analysis as the expression level did.

3.1 Differential activity analysis between two adjacent stages

Here we performed the differential analysis between AECs and HECs at TF activity level using R package scran:

library(scran)
AEC_HEC_diff_act <- findMarkers(x = act_sce, groups=colData(act_sce)$stage, restrict=c("AEC","HEC"))
AEC_HEC_diff_act
## List of length 2
## names(2): AEC HEC

The results of differential activity analysis is a list of Data Frames.

HEC_pos_diff_act <- as.data.frame(AEC_HEC_diff_act$HEC)
HEC_pos_diff_act$Direction <- ifelse(HEC_pos_diff_act$FDR<0.05,ifelse(HEC_pos_diff_act$summary.logFC>0,'up','down'),'not_sig')
HEC_pos_diff_act$Direction <- factor(x = HEC_pos_diff_act$Direction,level=c('up','down','not_sig'))
head(HEC_pos_diff_act)
ABCDEFGHIJ0123456789
 
 
Top
<int>
p.value
<dbl>
FDR
<dbl>
summary.logFC
<dbl>
logFC.AEC
<dbl>
Direction
<fct>
Nfe212.480302e-078.557341e-055.1305345.130534up
Runx122.547060e-078.557341e-055.9877955.987795up
Elf133.363936e-078.557341e-054.6552174.655217up
Bcl11a44.119057e-078.557341e-055.0823315.082331up
Myb58.725999e-071.329558e-045.0444885.044488up
Hif1a69.599699e-071.329558e-043.9503773.950377up

Now, we can use volcano plot to visualize the results as follow:

library(ggplot2)

vol <- ggplot(data = HEC_pos_diff_act, mapping = aes(x = summary.logFC, y=-log10(p.value), color=Direction))+
  geom_point(shape=16)+
  scale_color_manual(values = c("red","blue","gray"))+
  theme_bw()
vol

3.2 cell-type-specific TF comparison between expression and activity level

Here we use findAllCTSRegulons() to calculate the cell-type-specific TFs for all 6 stages at regulon level and expression level.

cts_act <- findAllCTSRegulons(x = act_sce, clust_name = 'stage', use_altExp = FALSE, use_assay = "atfr", nperm = 10000, ncores=4)
cts_exp <- findAllCTSRegulons(x = exp_sce, clust_name = 'stage', use_altExp = FALSE, use_assay = "logcounts",nperm = 10000, ncores=4)

Please note that if you want to calculate the cell-type-specific TFs for only few stages, you can use findCellTypeSignature() to calculate them. For example, To calculate the TFs that specifically activated in AEC:

AEC_cts_act <- findCellTypeSignature(x = act_sce, pos_clust="AEC", clust_name = 'stage',use_altExp = FALSE,nperm=10000,ncores=4)

Next, we want to identify TFs that only showed cell-type-specificity at activity level but not expression level:

library(purrr)
library(dplyr)
cts_act_list <- split(cts_act,cts_act$stage)
cts_exp_list <- split(cts_exp,cts_exp$stage)
cts_act_only <- purrr::map2(.x = cts_act_list, .y = cts_exp_list,.f = function(x,y){
  x %>% filter(!(regulon %in% y$regulon))
})
cts_act_only <- do.call(rbind,cts_act_only)
head(cts_act_only)
ABCDEFGHIJ0123456789
stage
<chr>
regulon
<chr>
pos_score
<dbl>
neg_score
<dbl>
ave_diff
<dbl>
CSS
<dbl>
pval
<dbl>
padj
<dbl>
Adult_HSCCebpb7.5078392.9973254.5105140.461563200
Adult_HSCPgr7.6131863.0719524.5412340.459834600
Adult_HSCZfp3827.8226063.1826854.6399210.459147200
Adult_HSCHdx5.1240962.0756043.0484920.457404200
Adult_HSCSnai16.8849672.9310693.9538980.451910300
Adult_HSCMlxipl8.0283793.4522684.5761100.450686000

3.3 Visualization of cell-type-specific TFs

To explore the distribution these cell-type-specific TFs, we utilize the RadViz figure to visualize the distribution of these signature genes. Firstly, we calculate the average activity of each TF in every stage using function summarizeClusterValue().

mean_act_score <- summarizeClusterValue(object = act_sce, clust_name = "stage", gene_name = unique(cts_act_only$regulon), use_assay = "atfr")
mean_act_score <- data.frame(gene_name = row.names(mean_act_score), mean_act_score)
head(mean_act_score)
ABCDEFGHIJ0123456789
 
 
gene_name
<chr>
Adult_HSC
<dbl>
AEC
<dbl>
FL_HSC
<dbl>
HEC
<dbl>
T1_pre_HSC
<dbl>
T2_pre_HSC
<dbl>
CebpbCebpb7.5078393.3889913.5759333.3674011.3920663.228716
PgrPgr7.6131864.4641973.6015943.1147281.4371192.904680
Zfp382Zfp3827.8226064.1678434.0570072.7143601.7886043.046008
HdxHdx5.1240961.7791722.9056421.2409811.7217822.278481
Snai1Snai16.8849674.6093863.2389353.2057871.6550942.140615
MlxiplMlxipl8.0283793.2383403.8954413.5086662.6810853.775979

Then, we use the hexagonal density plot to visualize the distribution of cell-type-specific TFs as follow:

cts_act_frm <- left_join(x = cts_act_only, y = mean_act_score,by=c("regulon"="gene_name"))
stages <- c("AEC", "HEC", "T1_pre_HSC", "T2_pre_HSC","FL_HSC","Adult_HSC")
cts_act_frm$stage <- factor(cts_act_frm$stage,levels=stages)
rv1 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4, plot_type = "hexagonal")
rv1

From the above result, we can see that most of the cell-type-specific TFs only at activity level fall into fetal liver HSCs and adult bone marrow HSCs. Next, to further explore the disribution of these cell-type-specific TFs, we colored the points with their cell-type-specificity label.

rv2 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4)
rv2

In the above figure, we found that the cell-type-specificity label showed relative high correlation with their location. Note that the rv generate from RadvizEnrichPlot() function is a ggplot object, which means that you can modify this object with functions in ggplot2 package (e.g. adding gene labels).

candidates <- c("Sox6","Ddit3","Cebpb")
names(candidates) <- candidates
cts_act_frm$gene_label <- candidates[cts_act_frm$regulon]

rv2 <- RadvizEnrichPlot(x = cts_act_frm, color_by='stage', anchors = stages, optim_anchor = F, recenter = stages, text_size = 4)+
  geom_text(mapping = aes(label=gene_label),na.rm = TRUE)
rv2

Besides, we noticed that some of the above TFs showed cell-type-specificity in more than one cell types. To better describe cell-type-specific distribution among these cell types, we can also use Venn diagram to visualize these TFs:

library(UpSetR)
## 
## 载入程辑包:'UpSetR'
## The following object is masked from 'package:lattice':
## 
##     histogram
cts_act_only_stat <- table(cts_act_only$regulon,cts_act_only$stage)
attributes(cts_act_only_stat)$class <- "matrix"
data_set <- as.data.frame(cts_act_only_stat)
data_set <- data_set[,stages]

setR <- upset(data_set, nsets = 6, nintersects = 40, mb.ratio = c(0.5, 0.5),
      order.by = c("degree","freq"), decreasing = c(F,TRUE), point.size=5,
      line.size=2)
setR

4. Enrichment analysis of regulons

Since a TF regulon includes a group of targets, which may participate some pathways. We use Jaccard test to estimate the similarity between regulons and pathways as follow:

data(Reactome_pathway)

mm_pathway <- Reactome_pathway$`Mus musculus`
cts_act_HEC <- cts_act_frm %>% filter(stage %in% "HEC")
cts_act_HEC_regulon <- as.list(Regulon(atfr)[cts_act_HEC$regulon])
enrich_res <- calcuModuleSimilar(tfr_list = cts_act_HEC_regulon, pathway_list = mm_pathway,fdr = 0.01, min_overlap = 10, ncores = 4)
head(enrich_res)
ABCDEFGHIJ0123456789
regulon
<chr>
pathway
<chr>
similarity
<dbl>
pval
<dbl>
padj
<dbl>
Zfp322aMetabolism0.016902701.796942e-057.773865e-03
Srebf1Metabolism of lipids0.022759602.895714e-055.424638e-03
Ncoa2Metabolism of RNA0.028532613.619974e-052.034425e-03
Ncoa2Translation0.028639626.451759e-064.532361e-04
Fli1Fc epsilon receptor (FCERI) signaling0.022222225.035663e-055.660085e-03
Fli1Metabolism of RNA0.045810062.183843e-102.454639e-07

Since one regulon may enriched with multiple pathways, we merge them by the number of shared genes among every two pathways and set the pathway term with smallest p-value as the representative term. Meanwhile, we can use Sankey diagram to visualize the relationships between TFs and enriched pathways.

library(easyalluvial)
library(parcats)
enrich_res <- enrich_res %>% filter(firstInGroup == 1)
p = alluvial_wide(enrich_res, max_variables = 2, fill_by = "first_variable",
                  colorful_fill_variable_stratum=F,col_vector_value=NA)
parcats(p, marginal_histograms = FALSE, data_input = data, height=500,width = 500)
Zfp91regulonZfp322aSrebf2Srebf1Safb2Nrf1Ncoa2MazGtf2bFli1Etv6Etv4ErgElf4Elf2E2f3E2f2E2f1Aebp2TranslationpathwayTranscriptional Regulation by TP53Toll Like Receptor 9 (TLR9) CascadeSRP-dependent cotranslational protein targeting to membraneRegulation of TP53 ActivityRegulation of mRNA stability by proteins that bind AU-rich elementsProcessing of Capped Intron-Containing Pre-mRNAmRNA SplicingMetabolism of RNAMetabolism of lipidsMetabolismGene expression (Transcription)DNA ReplicationDeadenylation-dependent mRNA decayCell Cycle, MitoticCell Cycle CheckpointsCell CycleAutophagyActivation of the pre-replicative complexActivation of the mRNA upon binding of the cap-binding complex and eIFs, and subsequent binding to 43S

We use the TF symbols to represent the regulons and we can easily visualize the pathways of the regulons.

sessionInfo

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] parcats_0.0.2               easyalluvial_0.3.0         
##  [3] UpSetR_1.4.0                scran_1.20.1               
##  [5] metaRegulon_0.1.0           scATFR_0.1.9               
##  [7] stringr_1.4.0               progress_1.2.2             
##  [9] kSamples_1.2-9              SuppDists_1.1-9.7          
## [11] glmnet_4.1-3                Matrix_1.3-4               
## [13] factoextra_1.0.7            cvTools_0.3.2              
## [15] robustbase_0.93-9           lattice_0.20-44            
## [17] Radviz_0.9.2                ROCR_1.0-11                
## [19] AUCell_1.14.0               scater_1.20.1              
## [21] ggplot2_3.3.5               scuttle_1.2.0              
## [23] jaccard_0.1.0               qvalue_2.24.0              
## [25] dplyr_1.0.8                 ppcor_1.1                  
## [27] MASS_7.3-54                 scLink_1.0.1               
## [29] glasso_1.11                 RcisTarget_1.12.0          
## [31] fgsea_1.18.0                SeuratObject_4.0.4         
## [33] Seurat_4.1.0                GENIE3_1.14.0              
## [35] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
## [37] Biobase_2.52.0              GenomicRanges_1.44.0       
## [39] GenomeInfoDb_1.28.1         IRanges_2.26.0             
## [41] S4Vectors_0.30.0            BiocGenerics_0.38.0        
## [43] MatrixGenerics_1.4.0        matrixStats_0.59.0         
## [45] PUIC_0.1.0                  reshape2_1.4.4             
## [47] minet_3.50.0                pbapply_1.5-0              
## [49] purrr_0.3.4                
## 
## loaded via a namespace (and not attached):
##   [1] scattermore_0.8           R.methodsS3_1.8.1        
##   [3] tidyr_1.2.0               bit64_4.0.5              
##   [5] knitr_1.33                irlba_2.3.3              
##   [7] DelayedArray_0.18.0       R.utils_2.10.1           
##   [9] data.table_1.14.0         rpart_4.1-15             
##  [11] KEGGREST_1.32.0           RCurl_1.98-1.3           
##  [13] generics_0.1.2            ScaledMatrix_1.0.0       
##  [15] cowplot_1.1.1             RSQLite_2.2.7            
##  [17] RANN_2.6.1                future_1.24.0            
##  [19] bit_4.0.4                 lubridate_1.7.10         
##  [21] spatstat.data_2.1-2       httpuv_1.6.3             
##  [23] assertthat_0.2.1          viridis_0.6.2            
##  [25] gower_0.2.2               xfun_0.24                
##  [27] hms_1.1.1                 jquerylib_0.1.4          
##  [29] evaluate_0.15             promises_1.2.0.1         
##  [31] DEoptimR_1.0-10           fansi_0.5.0              
##  [33] igraph_1.2.6              DBI_1.1.1                
##  [35] htmlwidgets_1.5.4         spatstat.geom_2.3-2      
##  [37] ellipsis_0.3.2            annotate_1.70.0          
##  [39] deldir_1.0-6              sparseMatrixStats_1.4.0  
##  [41] vctrs_0.3.8               ggalluvial_0.12.3        
##  [43] abind_1.4-5               cachem_1.0.5             
##  [45] withr_2.5.0               progressr_0.8.0          
##  [47] sctransform_0.3.3         prettyunits_1.1.1        
##  [49] goftest_1.2-3             cluster_2.1.2            
##  [51] lazyeval_0.2.2            crayon_1.5.1             
##  [53] arrow_4.0.1               edgeR_3.34.0             
##  [55] recipes_0.1.16            pkgconfig_2.0.3          
##  [57] labeling_0.4.2            nlme_3.1-152             
##  [59] vipor_0.4.5               nnet_7.3-16              
##  [61] rlang_1.0.2               globals_0.14.0           
##  [63] lifecycle_1.0.1           miniUI_0.1.1.1           
##  [65] rsvd_1.0.5                polyclip_1.10-0          
##  [67] lmtest_0.9-38             graph_1.70.0             
##  [69] zoo_1.8-9                 beeswarm_0.4.0           
##  [71] ggridges_0.5.3            png_0.1-7                
##  [73] viridisLite_0.4.0         bitops_1.0-7             
##  [75] R.oo_1.24.0               KernSmooth_2.23-20       
##  [77] Biostrings_2.60.1         blob_1.2.1               
##  [79] DelayedMatrixStats_1.14.0 shape_1.4.6              
##  [81] parallelly_1.30.0         spatstat.random_2.1-0    
##  [83] beachmat_2.8.0            scales_1.1.1             
##  [85] memoise_2.0.0             GSEABase_1.54.0          
##  [87] magrittr_2.0.1            plyr_1.8.6               
##  [89] hexbin_1.28.2             ica_1.0-2                
##  [91] zlibbioc_1.38.0           compiler_4.1.1           
##  [93] dqrng_0.3.0               RColorBrewer_1.1-2       
##  [95] fitdistrplus_1.1-8        cli_3.0.0                
##  [97] XVector_0.32.0            listenv_0.8.0            
##  [99] patchwork_1.1.1           mgcv_1.8-36              
## [101] tidyselect_1.1.2          stringi_1.6.2            
## [103] forcats_0.5.1             highr_0.9                
## [105] yaml_2.2.1                BiocSingular_1.8.1       
## [107] locfit_1.5-9.4            ggrepel_0.9.1            
## [109] grid_4.1.1                sass_0.4.1               
## [111] fastmatch_1.1-0           tools_4.1.1              
## [113] future.apply_1.8.1        rstudioapi_0.13          
## [115] bluster_1.2.1             foreach_1.5.2            
## [117] metapod_1.0.0             gridExtra_2.3            
## [119] prodlim_2019.11.13        farver_2.1.0             
## [121] Rtsne_0.15                digest_0.6.27            
## [123] lava_1.6.10               shiny_1.7.1              
## [125] Rcpp_1.0.7                later_1.3.0              
## [127] RcppAnnoy_0.0.19          httr_1.4.2               
## [129] AnnotationDbi_1.54.1      colorspace_2.0-2         
## [131] XML_3.99-0.6              tensor_1.5               
## [133] reticulate_1.20           splines_4.1.1            
## [135] uwot_0.1.11               statmod_1.4.36           
## [137] spatstat.utils_2.3-0      plotly_4.10.0            
## [139] xtable_1.8-4              jsonlite_1.7.2           
## [141] timeDate_3043.102         ipred_0.9-11             
## [143] R6_2.5.1                  pillar_1.7.0             
## [145] htmltools_0.5.2           mime_0.12                
## [147] glue_1.4.2                fastmap_1.1.0            
## [149] BiocParallel_1.26.1       BiocNeighbors_1.10.0     
## [151] class_7.3-19              codetools_0.2-18         
## [153] utf8_1.2.1                bslib_0.3.1              
## [155] spatstat.sparse_2.1-0     tibble_3.1.2             
## [157] ggbeeswarm_0.6.0          leiden_0.3.9             
## [159] survival_3.2-11           limma_3.48.1             
## [161] rmarkdown_2.9             munsell_0.5.0            
## [163] GenomeInfoDbData_1.2.6    iterators_1.0.14         
## [165] gtable_0.3.0              spatstat.core_2.4-0